Spidel Tech Solutions, Inc. - Service Level Planning

Service Level Planning

The following categories of planning are required to varying degrees, depending on how heavily the business depends on IT.

Installation Plan and Procedures

These plans describe how the installation of new equipment will be managed to maximize its performance and its ability to recover from failure.

Backup Plan and Procedures

Backup plans define the schedule and scope of system backups. It is important to assess the value of system information and how it may be recovered. The more often the information changes, the more often it must be backed up. Software programs generally are not backed up, because they do not (or at least are not supposed to) change during normal operation.

Databases change constantly, often in ways that require entire disk snapshots to capture every change. How far from the site where the information is used the backups should be stored depends on the scope of system disruption that can be tolerated. Some companies store their backups inside a distant mountain so they can survive natural and man-made disasters that affect whole regions of a country.

Backup Aging

Backup plans often include the idea of aging reusable media. The reasons include: backup media can be expensive, and data often has a finite lifespan.

Backing up at a single fixed frequency causes several problems. For example, what if you only do incremental backups? Without a full backup to start from, it is nearly impossible to restore a complete image.

Suppose, on the other hand, you only do weekly backups. Then up to a week of transactions made since the last backup must be recovered some other way, and that effort may be enormous.
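
The incremental-only problem follows from how restores work: a complete image needs the most recent full backup plus every incremental taken after it, applied in order. The Python sketch below illustrates that rule; the `Backup` record and the `restore_chain` function are hypothetical names for this example, not the API of any particular backup tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Backup:
    taken: date
    kind: str  # "full" or "incremental" (hypothetical labels)


def restore_chain(backups: List[Backup], target: date) -> Optional[List[Backup]]:
    """Backups needed to rebuild the state as of `target`: the newest full backup
    on or before `target`, then every later incremental in date order.
    Returns None when there is no full backup, i.e. no complete image can be built."""
    usable = sorted((b for b in backups if b.taken <= target), key=lambda b: b.taken)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        return None
    base = fulls[-1]
    return [base] + [b for b in usable if b.kind == "incremental" and b.taken > base.taken]


# Incrementals alone never add up to a complete image.
only_incrementals = [Backup(date(2024, 1, d), "incremental") for d in range(1, 5)]
print(restore_chain(only_incrementals, date(2024, 1, 4)))  # None
```

With a weekly full backup in the set, the function returns that full backup followed by the later incrementals. Every incremental in that chain must be readable, so the longer the chain, the more media a single restore depends on.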

Aged backups are not exactly a shell game, but close. Suppose you are going to back up to USB drives, where one backup just fits the available media. You use some of the media for daily incremental backups. There are 4 of these, one for each day from Monday through Thursday, and you recycle them each week: Monday's backup always goes on the Monday media. Then on Friday you do a full backup, but call it the weekly backup. You keep 3 of these for a month, and each is recycled on a particular week of the month. In the last week of the month, on Friday, you do a monthly backup instead. You keep 2 of these. In the third month of each quarter you do a quarterly backup instead of the monthly backup. Finally, at the end of the 4th quarter, you call that backup the annual backup and keep it for the number of years your business is required to keep records, perhaps 7 years. So you need 4 + 3 + 2 + 3 + 7 = 19 sets of media to keep an aged archive of backups. This beats 365 daily backups.
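
The rotation above can be expressed as a rule that maps each backup date to a media set. This is a minimal Python sketch that assumes backups run Monday through Friday, that "the last week of the month" means the last Friday of the month, and that quarters follow the calendar year. The labels (such as `weekly-2`) and the function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional

FRIDAY = 4  # date.weekday(): Monday == 0 ... Sunday == 6
DAY_NAMES = ("Mon", "Tue", "Wed", "Thu")


def is_last_friday(d: date) -> bool:
    """True if d is a Friday and no later Friday falls in the same month."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.weekday() == FRIDAY and d.day + 7 > days_in_month


def media_set_for(d: date) -> Optional[str]:
    """Hypothetical media label for a backup taken on date d, or None on weekends."""
    wd = d.weekday()
    if wd > FRIDAY:
        return None  # no backups on Saturday or Sunday
    if wd < FRIDAY:
        return f"daily-{DAY_NAMES[wd]}"  # incremental onto that weekday's drive
    if not is_last_friday(d):
        # Full backup, reused on the same Friday of next month.
        return f"weekly-{(d.day - 1) // 7 + 1}"
    if d.month % 3 != 0:
        return f"monthly-{d.month % 3}"  # last Friday of month 1 or 2 of a quarter
    if d.month != 12:
        return f"quarterly-{d.month // 3}"  # last Friday of March, June, September
    return f"annual-{d.year}"  # last Friday of December, kept for the retention period


def media_sets_needed(years_retained: int = 7) -> int:
    """Daily (Mon-Thu) + weekly + monthly + quarterly + one annual set per retained year."""
    daily, weekly, monthly, quarterly = 4, 3, 2, 3
    return daily + weekly + monthly + quarterly + years_retained
```

`media_sets_needed()` reproduces the 4 + 3 + 2 + 3 + 7 = 19 count. One caveat: in a month with five Fridays, the fourth Friday is not the last one, so this sketch labels it `weekly-4`. The three weekly sets in the count assume four-Friday months.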

So what if you decide to take the risk and simply reuse, say, 5 daily backups? The trouble is that problems in the data may not be revealed until several days after your last correct backup. By then the corruption has propagated to every backup you hold, and you have no way to recover from the existing backups.
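
The risk can be checked mechanically: a backup is clean only if it was taken before the corruption began. This Python sketch assumes each backup captures the state at the end of its day; `rotating_daily_backups` and `newest_clean_backup` are hypothetical helpers written for this example.

```python
from datetime import date, timedelta
from typing import List, Optional


def rotating_daily_backups(today: date, copies: int) -> List[date]:
    """Dates still on hand when `copies` daily media are reused in rotation:
    only the most recent `copies` days survive."""
    return [today - timedelta(days=i) for i in range(copies)]


def newest_clean_backup(backups: List[date], corruption_began: date) -> Optional[date]:
    """Newest backup taken before the corruption began, or None if every
    retained backup already contains the corrupted data."""
    clean = [b for b in backups if b < corruption_began]
    return max(clean) if clean else None


held = rotating_daily_backups(date(2024, 1, 10), copies=5)
print(newest_clean_backup(held, date(2024, 1, 3)))  # None: every copy is corrupted
print(newest_clean_backup(held + [date(2023, 12, 29)], date(2024, 1, 3)))  # 2023-12-29
```

With 5 rotating daily copies, corruption that begins on January 3 and is discovered on January 10 leaves no clean copy. Keeping even one older backup, as the aged scheme does with its weekly and monthly sets, restores a recovery point.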

Measurement Plan and Procedures

This plan covers the collection and reporting of specific business measurements in order to verify service levels. Business systems can generally be partitioned into the following subcategories: computer hardware, networking equipment, applications, and repositories (databases). One primary measurement is availability, which is expressed in terms of Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR). Together they express the system's duty cycle: how much of the time it is "up" and how much it is "down". Another principal measure is throughput, expressed as transactions per second or as the response time of complex transactions.
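
Availability is conventionally derived from these two means as the fraction of time the system is up. As a worked example with illustrative numbers (not measurements from any real system), a system that averages 720 hours between failures and 4 hours to repair, and that completes 18,000 transactions in a 60-second interval, gives:

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
  = \frac{720\ \mathrm{h}}{720\ \mathrm{h} + 4\ \mathrm{h}}
  \approx 0.9945 \quad (99.45\%\ \text{up},\ 0.55\%\ \text{down})

X = \frac{N_{\text{transactions}}}{T}
  = \frac{18\,000}{60\ \mathrm{s}}
  = 300\ \text{transactions/s}
```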

We Get What We Measure

When I was back in the School of Technology at ISU, my professors taught us that all measurement systems affect the system being measured. And sure enough, when we examined the instruments at our disposal, we found that just placing a measurement instrument into a circuit caused the circuit to behave differently, if ever so slightly.

I noticed something a little different when I was a software group leader. It seemed that everything we tried to measure for software productivity resulted in an increase in the value of the measurement without necessarily increasing the productivity of the process.

For example, first we measured lines of code per project. Lo and behold, lines of code went up. But everyone knows you can add more lines of code without increasing functionality, or just about anything else except lines of code.

Then there were comments per line of code. Well, comment density improved, but understandability did not.

My old programmer friend, who shall remain nameless, used to add “Please don’t tell my mom” to his code. And then there was the ever-famous documentation line “Welcome to Hopkinsville, Kentucky.” See what I mean?

And then there was decision density as a measure of complexity, and Shazam, there were “If” statements all over the place. Quality analysts are always measuring something, just to see whether it is improving. But let us take care to measure real output, not just things that are easy to measure.
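
To see how easily such counts move without any change in function, here is a deliberately naive Python sketch of the three metrics mentioned above: lines of code, comments per code line, and `if` statements per code line. The line-based counting rules and the `naive_metrics` name are simplifying assumptions made for illustration.

```python
def naive_metrics(source: str) -> dict:
    """Hypothetical line-based metrics for Python-like source: non-blank lines,
    comments per code line, and if/elif statements per code line."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    comments = [ln for ln in lines if ln.startswith("#")]
    code = [ln for ln in lines if not ln.startswith("#")]
    decisions = [ln for ln in code if ln.startswith(("if ", "elif "))]
    n = len(code) or 1  # avoid dividing by zero on comment-only input
    return {
        "lines": len(lines),
        "comment_density": len(comments) / n,
        "decision_density": len(decisions) / n,
    }


original_src = """
def total(xs):
    return sum(xs)
"""

# Same behavior, padded to score better on every metric.
padded_src = """
# Please don't tell my mom
def total(xs):
    # Welcome to Hopkinsville, Kentucky
    if True:
        pass
    return sum(xs)
"""

print(naive_metrics(original_src))
print(naive_metrics(padded_src))
```

Both versions of `total` return the same sum, yet the padded one has three times the lines, a comment density of 0.5, and a decision density of 0.25.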

What are some real things to measure? How about good old MTBF and MTTR (mean time between failures and mean time to repair)? Or how about defects per 1,000 components received at the dock? Of course there is the ever-popular application transactions per second, and the not-so-popular customer-detected design faults per month. And these are just a few.
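
Given an outage log, MTBF and MTTR take only a few lines to compute. This Python sketch assumes outages are recorded as start and end times in hours from the start of the reporting period, and uses the common operational definitions (total up time per failure and total down time per failure). `Outage`, `mtbf_mttr`, and `defects_per_thousand` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Outage:
    start_h: float  # hours from the start of the reporting period when service failed
    end_h: float    # hours from the start of the period when service was restored


def mtbf_mttr(outages: List[Outage], period_h: float) -> Tuple[float, float]:
    """MTBF = total up time / failures; MTTR = total down time / failures."""
    if not outages:
        raise ValueError("no failures in the period, so MTBF is undefined")
    down = sum(o.end_h - o.start_h for o in outages)
    n = len(outages)
    return (period_h - down) / n, down / n


def defects_per_thousand(defective: int, received: int) -> float:
    """Incoming quality: defective components per 1,000 received at the dock."""
    return 1000 * defective / received


mtbf, mttr = mtbf_mttr([Outage(100, 102), Outage(400, 406)], period_h=720)
print(mtbf, mttr)                      # 356.0 4.0
print(defects_per_thousand(3, 1500))   # 2.0
```

For a 720-hour month with a 2-hour and a 6-hour outage, this gives an MTBF of 356 hours and an MTTR of 4 hours; 3 defective parts in a shipment of 1,500 is 2 defects per 1,000.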

Reporting Plan and Procedures

How do the client and the service level provider communicate system service levels? Which daily, weekly, or monthly measures are collected and reported? What happens if the levels are not met, and what gets escalated? This includes penalties and bonuses.
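
A reporting plan ultimately compares each measured value with its agreed target and decides what to escalate. The Python sketch below shows one way to do that. The `Target` record, the metric names, and the penalty and bonus rates (5% of the monthly fee per missed target, a 2% bonus when every target is met) are all hypothetical; real terms come from the service level agreement itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Target:
    metric: str
    threshold: float
    higher_is_better: bool = True  # False for metrics such as response time


def build_report(targets: List[Target], measured: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return human-readable report lines and the list of missed metrics to escalate."""
    lines, breaches = [], []
    for t in targets:
        value = measured[t.metric]
        met = value >= t.threshold if t.higher_is_better else value <= t.threshold
        op = ">=" if t.higher_is_better else "<="
        lines.append(f"{t.metric}: {value} (target {op} {t.threshold}) {'MET' if met else 'MISSED'}")
        if not met:
            breaches.append(t.metric)
    return lines, breaches


def fee_adjustment(breaches: List[str], monthly_fee: float,
                   penalty_rate: float = 0.05, bonus_rate: float = 0.02) -> float:
    """Hypothetical terms: a credit of penalty_rate * fee per missed target, or a
    bonus of bonus_rate * fee when every target is met. Negative means penalty."""
    if breaches:
        return -penalty_rate * monthly_fee * len(breaches)
    return bonus_rate * monthly_fee


targets = [
    Target("availability_pct", 99.5),
    Target("transactions_per_s", 300),
    Target("avg_response_s", 2.0, higher_is_better=False),
]
measured = {"availability_pct": 99.2, "transactions_per_s": 320, "avg_response_s": 1.4}
lines, breaches = build_report(targets, measured)
for line in lines:
    print(line)
print("escalate:", breaches, "adjustment:", fee_adjustment(breaches, monthly_fee=10_000))
```

Here availability misses its target, so it is the one item escalated, and under these assumed terms the month's fee is reduced by 500.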

Contingency Planning

Every system has anticipated vulnerabilities. Each identified vulnerability should be addressed by a contingency plan.

For example, what will you do if a vital communications facility fails? The response should be described and decided in advance, while cool heads prevail.

Failover Plan and Procedures

This plan describes the intent to deal with disaster. What will you do if your computer center roof collapses? What happens if the city block where your IT is centralized is wiped out by a tornado? What if terrorists detonate a nuclear warhead? Which of these scenarios are you prepared to plan for? Regardless of the scope of the disaster, how much time do you have to get back in business? If you run a gas station, you might consider early retirement, but if your business is distributed across 3 states, others may be depending on you to get back in service quickly. So whom do you contract with to provide a backup system if yours becomes unavailable for the foreseeable future? These are the questions a failover plan must answer.

Recovery Plan and Procedures

This plan covers recovery once the immediate disaster has been dealt with. OK, so you leased a system in a neighboring state and have been shipping backups there for 2 years. Your building was flooded, so you immediately moved operations to the failover site. Now what do you do? What is the plan to "get back to normal"? Do you have the building plans ready if the insurance pays off?

Someone once said:

"Failure to plan is a plan to fail."

Planning is much easier before the calamity than afterward. It is an investment, but it may just help you survive when everything comes apart at the seams.

